{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2024 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f22a409c18ef"
      },
      "source": [
        "# Summarize large documents using LangChain and Gemini"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "awKO767lQIWh"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google/generative-ai-docs/blob/main/examples/gemini/python/langchain/Gemini_LangChain_Summarization_WebLoad.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/langchain/Gemini_LangChain_Summarization_WebLoad.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "</table>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f892e8b2c8ef"
      },
      "source": [
        "## Overview\n",
        "\n",
        "[Gemini](https://ai.google.dev/models/gemini) is a family of generative AI models that lets developers generate content and solve problems. These models are designed and trained to handle both text and images as input.\n",
        "\n",
        "[LangChain](https://www.langchain.com/) is a framework designed to make integration of Large Language Models (LLM) like Gemini easier for applications.\n",
        "\n",
        "In this notebook, you'll learn how to create an application to summarize large documents using Gemini and LangChain.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iHj4T7hsx1EB"
      },
      "source": [
        "## Setup\n",
        "\n",
        "First, you must install the packages and set the necessary environment variables.\n",
        "\n",
        "### Installation\n",
        "\n",
        "Install LangChain's Python library, `langchain`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yERdO0eFJpb-"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m803.1/803.1 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.5/1.5 MB\u001b[0m \u001b[31m9.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m205.7/205.7 kB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.7/46.7 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.4/49.4 kB\u001b[0m \u001b[31m3.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h"
          ]
        }
      ],
      "source": [
        "!pip install --quiet langchain"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "45MVc1stzNUN"
      },
      "source": [
        "Install LangChain's integration package for Gemini, `langchain-google-genai`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "FcMXJTN5JsfU"
      },
      "outputs": [],
      "source": [
        "!pip install --quiet langchain-google-genai"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ycFMUTxn0VoI"
      },
      "source": [
        "### Grab an API Key\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e1dZNWvUzksX"
      },
      "source": [
        "To use Gemini you need an *API key*. You can create an API key with one click in [Google AI Studio](https://makersuite.google.com/).\n",
        "After creating the API key, you can either set an environment variable named `GOOGLE_API_KEY` to your API Key or pass the API key as an argument when creating the `ChatGoogleGenerativeAI` LLM using `LangChain`.\n",
        "\n",
        "In this tutorial, you will set the environment variable `GOOGLE_API_KEY` to configure Gemini to use your API key."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1RO5jvTTddtc"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Gemini API Key:··········\n"
          ]
        }
      ],
      "source": [
        "# Run this cell and paste the API key in the prompt\n",
        "import os\n",
        "import getpass\n",
        "\n",
        "os.environ['GOOGLE_API_KEY'] = getpass.getpass('Gemini API Key:')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i7wgsoiz418u"
      },
      "source": [
        "## Summarize text\n",
        "\n",
        "In this tutorial, you are going to summarize the text from a website using the Gemini model integrated through LangChain.\n",
        "\n",
        "You'll perform the following steps to achieve the same:\n",
        "1. Read and parse the website data using LangChain.\n",
        "2. Chain together the following:\n",
        "    * A prompt for extracting the required input data from the parsed website data.\n",
        "    * A prompt for summarizing the text using LangChain.\n",
        "    * An LLM model (Gemini) for prompting.\n",
        "\n",
        "3. Run the created chain to prompt the model for the summary of the website data."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RhL92-zmSB6Z"
      },
      "source": [
        "### Import the required libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rAv0UicpKARZ"
      },
      "outputs": [],
      "source": [
        "from langchain import PromptTemplate\n",
        "from langchain.document_loaders import WebBaseLoader\n",
        "from langchain.schema import StrOutputParser\n",
        "from langchain.schema.prompt_template import format_document"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4tKpRvmMRX23"
      },
      "source": [
        "### Read and parse the website data\n",
        "\n",
        "LangChain provides a wide variety of document loaders. To read the website data as a document, you will use the `WebBaseLoader` from LangChain.\n",
        "\n",
        "To know more about how to read and parse input data from different sources using the document loaders of LangChain, read LangChain's [document loaders guide](https://python.langchain.com/docs/integrations/document_loaders)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "TTgmyxXzKCSq"
      },
      "outputs": [],
      "source": [
        "loader = WebBaseLoader(\"https://blog.google/technology/ai/google-gemini-ai/#sundar-note\")\n",
        "docs = loader.load()\n",
        "\n",
        "print(docs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4xlf_F_4B6lB"
      },
      "source": [
        "### Initialize Gemini\n",
        "\n",
        "You must import the `ChatGoogleGenerativeAI` LLM from LangChain to initialize your model.\n",
        " In this example you will use **gemini-pro**, as it supports text summarization. To know more about the text model, read Google AI's [language documentation](https://ai.google.dev/models/gemini).\n",
        "\n",
        "You can configure the model parameters such as ***temperature*** or ***top_p***,  by passing the appropriate values when creating the `ChatGoogleGenerativeAI` LLM.  To learn more about the parameters and their uses, read Google AI's [concepts guide](https://ai.google.dev/docs/concepts#model_parameters)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WWA9F0ZqB-8k"
      },
      "outputs": [],
      "source": [
        "from langchain_google_genai import ChatGoogleGenerativeAI\n",
        "\n",
        "# If there is no env variable set for API key, you can pass the API key\n",
        "# to the parameter `google_api_key` of the `ChatGoogleGenerativeAI` function:\n",
        "# `google_api_key=\"key\"`.\n",
        "\n",
        "llm = ChatGoogleGenerativeAI(model=\"gemini-pro\",\n",
        "                 temperature=0.7, top_p=0.85)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6TECDzaUSTvS"
      },
      "source": [
        "### Create prompt templates\n",
        "\n",
        "You'll use LangChain's [PromptTemplate](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/) to generate prompts for summarizing the text.\n",
        "\n",
        "To summarize the text from the website, you will need the following prompts.\n",
        "1. Prompt to extract the data from the output of `WebBaseLoader`, named `doc_prompt`\n",
        "2. Prompt for the LLM model (Gemini) to summarize the extracted text, named `llm_prompt`.\n",
        "\n",
        "In the `llm_prompt`, the variable `text` will be replaced later by the text from the website."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rixvvvaNKLe_"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "input_variables=['text'] template='Write a concise summary of the following:\\n\"{text}\"\\nCONCISE SUMMARY:'\n"
          ]
        }
      ],
      "source": [
        "# To extract data from WebBaseLoader\n",
        "doc_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
        "\n",
        "# To query Gemini\n",
        "llm_prompt_template = \"\"\"Write a concise summary of the following:\n",
        "\"{text}\"\n",
        "CONCISE SUMMARY:\"\"\"\n",
        "llm_prompt = PromptTemplate.from_template(llm_prompt_template)\n",
        "\n",
        "print(llm_prompt)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-wPBMFyISh13"
      },
      "source": [
        "### Create a Stuff documents chain\n",
        "\n",
        "LangChain provides [Chains](https://python.langchain.com/docs/modules/chains/) for chaining together LLMs with each other or other components for complex applications. You will create a **Stuff documents chain** for this application. A **Stuff documents chain** lets you combine all the documents, insert them into the prompt and pass that prompt to the LLM.\n",
        "\n",
        "You can create a Stuff documents chain using the [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/expression_language).\n",
        "\n",
        "To learn more about different types of document chains, read LangChain's [chains guide](https://python.langchain.com/docs/modules/chains/document/)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EMZomQdyKMr5"
      },
      "outputs": [],
      "source": [
        "# Create Stuff documents chain using LCEL.\n",
        "# This is called a chain because you are chaining\n",
        "# together different elements with the LLM.\n",
        "# In the following example, to create stuff chain,\n",
        "# you will combine content, prompt, LLM model and\n",
        "# output parser together like a chain using LCEL.\n",
        "#\n",
        "# The chain implements the following pipeline:\n",
        "# 1. Extract data from documents and save to variable `text`.\n",
        "# 2. This `text` is then passed to the prompt and input variable\n",
        "#    in prompt is populated.\n",
        "# 3. The prompt is then passed to the LLM (Gemini).\n",
        "# 4. Output from the LLM is passed through an output parser\n",
        "#    to structure the model response.\n",
        "\n",
        "stuff_chain = (\n",
        "    # Extract data from the documents and add to the key `text`.\n",
        "    {\n",
        "        \"text\": lambda docs: \"\\n\\n\".join(\n",
        "            format_document(doc, doc_prompt) for doc in docs\n",
        "        )\n",
        "    }\n",
        "    | llm_prompt         # Prompt for Gemini\n",
        "    | llm                # Gemini function\n",
        "    | StrOutputParser()  # output parser\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5L0Tvk_5eQzC"
      },
      "source": [
        "### Prompt the model\n",
        "\n",
        "To generate the summary of the  the website data, pass the documents extracted using the `WebBaseLoader` (`docs`) to `invoke()`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k9_GxkA5ePRR"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "\"Google introduces Gemini, its most capable AI model yet. Gemini is multimodal, flexible, and optimized for different sizes. It surpasses state-of-the-art performance on various benchmarks, including text, coding, and multimodal tasks. Gemini's capabilities include sophisticated reasoning, understanding text, images, audio, and advanced coding. It is designed with responsibility and safety at its core, undergoing comprehensive safety evaluations and incorporating safety classifiers. Gemini is being rolled out across Google products, including Bard, Pixel, Search, and Ads. Developers and enterprise customers can access Gemini Pro via the Gemini API. Gemini Ultra will be available to select partners and experts for early experimentation before a broader release. Gemini represents a new era of AI innovation, with future versions expected to advance planning, memory, and context processing capabilities.\""
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "stuff_chain.invoke(docs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nfrBsxUFgZzc"
      },
      "source": [
        "# Conclusion\n",
        "\n",
        "That's it. You have successfully created an LLM application to summarize text using LangChain and Gemini."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "Gemini_LangChain_Summarization_WebLoad.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
